Intro to remotemanager¶
remotemanager is the submission engine developed from the internal job submission system used within PyBigDFT.
The primary focus is helping you run massively parallel calculations on a remote machine, though features exist to allow a full workflow to be managed.
It is written entirely in Python and follows a strongly object-oriented style. This lends itself to a high level of modularity, allowing components to be replaced or even used individually.
Core Features¶
When using remotemanager, there are two main objects the user will interact with: Dataset and URL.
Dataset¶
Dataset is a container object that is used to store information about your current calculation/workflow.
Calculations are described by defining a Python function, and Dataset stores this along with any additional data.
URL¶
The URL object (and its derivatives) deals with connecting to the remote machine. These objects let you specify all the important aspects of a machine, from connection parameters to job submission specifics.
Structure¶
The basic overall structure of a Dataset is as shown below:
Figure: A basic overview of the structure of a Dataset. In this example, the function takes an input name and returns the string hello {name}. Runners can be added with the append_run method of Dataset, defining the input data.
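The relationship between the function, the Dataset and its runners can be pictured with a toy model in plain Python. This is only a conceptual sketch: `ToyDataset` and `ToyRunner` are hypothetical stand-ins written for this illustration, not remotemanager classes, the `args` keyword is an assumption of the sketch, and nothing here is sent to a remote machine.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


def hello(name: str) -> str:
    """The function from the example: takes a name, returns a greeting."""
    return f"hello {name}"


@dataclass
class ToyRunner:
    """Hypothetical stand-in: one set of inputs and its eventual result."""
    args: Dict[str, Any]
    result: Optional[Any] = None


@dataclass
class ToyDataset:
    """Hypothetical stand-in: stores a function and a list of runners."""
    function: Callable[..., Any]
    runners: List[ToyRunner] = field(default_factory=list)

    def append_run(self, args: Dict[str, Any]) -> None:
        # Each call adds one runner holding the inputs for one function call.
        self.runners.append(ToyRunner(args=args))

    def run(self) -> List[Any]:
        # Runs locally in this sketch; no remote machine is involved.
        for runner in self.runners:
            runner.result = self.function(**runner.args)
        return [runner.result for runner in self.runners]


ds = ToyDataset(function=hello)
ds.append_run(args={"name": "alice"})
ds.append_run(args={"name": "bob"})
results = ds.run()
print(results)
```

The point of the sketch is that one function is shared by many runners, and each runner only carries its own input data and result.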
Note
The Database within this example is not a “true” searchable database. It exists to checkpoint the current workflow, acting as a safeguard against data loss.
Warning
The database functionality enables the sending and receiving of Datasets. You should only run Datasets received from people you trust.
Requirements¶
There are some requirements to be aware of:
A passwordless connection to the remote machine*
python >= 3.7 on the local machine**
python >= 3.5 on the remote machine. (This depends on the content of your function; using features from newer Python versions raises this requirement.)
A Linux-based operating system on the remote***
Note
* A passwordless connection can often be set up by way of ssh keys. This is covered further in the later section on connecting to a remote machine.
Note
** Python >= 3.9 is recommended if you are using sanzu.
Note
*** Limited support exists for Windows, but only on the local machine.
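A quick way to check the local Python requirement is to compare `sys.version_info` against the minimums listed above. The helper name `meets_minimum` and the two constants are made up for this sketch. The remote interpreter would have to be checked separately on that machine.

```python
import sys

LOCAL_MINIMUM = (3, 7)      # required on the local machine
SANZU_RECOMMENDED = (3, 9)  # recommended when using sanzu


def meets_minimum(version, minimum):
    """Return True if a (major, minor, ...) version tuple satisfies the minimum."""
    return tuple(version[:2]) >= tuple(minimum)


current = sys.version_info
print("local Python OK:", meets_minimum(current, LOCAL_MINIMUM))
print("sanzu recommendation met:", meets_minimum(current, SANZU_RECOMMENDED))
```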
License¶
remotemanager is open source and licensed under the MIT license. See the license page for more info.
Contributing¶
All contributions are welcome! If you spot a bug or have a feature request, don’t hesitate to open an issue, or fork the main repo and submit a merge request!
Connecting to a Remote Machine¶
To run your functions on a machine other than your own, you must be able to ssh into that machine without any further input.
For example, ssh user@remote.address should put you in a shell on that machine.
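One way to test this without risking an interactive prompt is to run ssh in batch mode from Python. The sketch below relies on the standard OpenSSH options `BatchMode` and `ConnectTimeout`. The function names and the `user`/`remote.address` placeholders are illustrative and not part of remotemanager.

```python
import subprocess


def build_check_command(user, host):
    """Build an ssh command that fails instead of prompting for a password."""
    # BatchMode=yes is a standard OpenSSH option that disables password prompts.
    return [
        "ssh",
        "-o", "BatchMode=yes",
        "-o", "ConnectTimeout=10",
        f"{user}@{host}",
        "true",
    ]


def can_connect(user, host):
    """Return True if ssh logs in and runs `true` without any user input."""
    try:
        completed = subprocess.run(
            build_check_command(user, host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh is not installed, or the connection attempt hung.
        return False
    return completed.returncode == 0


command = build_check_command("user", "remote.address")
print(" ".join(command))
```

Calling `can_connect("user", "remote.address")` with real credentials returns `False` whenever ssh would have needed a password or other input.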
If you are asked for a password, the quickest solution is often to create an ssh key and copy it over to the remote using ssh-copy-id.
To sum up the ssh system, these are the basic steps:
Create the ssh-key with
ssh-keygen
Copy that key to your remote with
ssh-copy-id -i ~/.ssh/{key_name} user@remote.host
However, if your remote system incorporates extra security (such as a password in addition to a key), sshpass can still allow a connection without manual input.
To do this, start with installing sshpass:
sudo apt install sshpass
Note
macOS can raise issues when installing sshpass, citing security concerns. There are mirrors which circumvent this block; if you run into this, search for a recent one.
Then copy your password into a file with 400 permissions.
Warning
When connecting via sshpass, it is advised not to type your password directly into the command, as it can be captured in your bash history and exposed. Take steps to minimise the security risk, such as storing the password in a “hidden”, nondescript file such as .file with the proper permissions (400).
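A password file with 400 permissions can also be created from Python, which keeps the password out of the shell history. This is a minimal sketch for POSIX systems. The helper name `write_password_file` and the placeholder password are assumptions made for the example.

```python
import os
import stat
import tempfile


def write_password_file(path, password):
    """Create `path` readable only by its owner (mode 400) and write the password."""
    # O_EXCL refuses to overwrite an existing file; 0o400 is owner read-only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o400)
    with os.fdopen(fd, "w") as handle:
        handle.write(password + "\n")
    # Set the mode explicitly in case the umask altered it at creation time.
    os.chmod(path, 0o400)
    return path


# Demonstration in a throwaway directory with a placeholder password.
with tempfile.TemporaryDirectory() as workdir:
    passfile = write_password_file(os.path.join(workdir, ".file"), "not-a-real-password")
    mode = stat.S_IMODE(os.stat(passfile).st_mode)
    print(oct(mode))
```

In real use the password would come from something like `getpass.getpass()` rather than a literal in a script, for the same reason it should not be typed into the command line.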
Now that sshpass is installed and the password file is in place, we can connect using:
sshpass -f <passwordfile> ssh user@remote.host
Added in version 0.3.7.
The inbuilt URL module has native support for sshpass files. Simply give the passfile argument the absolute path to your file:
from remotemanager import URL
url = URL(user=..., host=..., passfile='~/passwordfile')
Added in version 0.5.7.
For an extra layer of separation, you can specify the path to your file within an environment variable with
export SSHPASSFILE='/path/to/.file'
Then pass this to passfile with os.environ['SSHPASSFILE'].
Alternatively, you can pass the name of this variable directly to URL with url = URL(..., envpass='SSHPASSFILE').
Note
An explicit path passed to passfile will be prioritised over envpass.
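The precedence between passfile and envpass can be sketched in a few lines. This illustrates the documented behaviour only and is not remotemanager's own code. The function `resolve_passfile`, its `environ` parameter and the tilde expansion are assumptions made for the example.

```python
import os


def resolve_passfile(passfile=None, envpass=None, environ=None):
    """Pick the password file path, preferring an explicit `passfile`."""
    environ = os.environ if environ is None else environ
    if passfile is not None:
        # An explicit path wins over the environment variable.
        return os.path.expanduser(passfile)
    if envpass is not None:
        # `envpass` holds the NAME of the variable, not the path itself.
        path = environ.get(envpass)
        if path is None:
            raise KeyError(f"environment variable {envpass!r} is not set")
        return os.path.expanduser(path)
    return None


# A stand-in environment, so the example does not depend on the real one.
fake_env = {"SSHPASSFILE": "/path/to/.file"}
from_env = resolve_passfile(envpass="SSHPASSFILE", environ=fake_env)
explicit = resolve_passfile(
    passfile="/other/.file", envpass="SSHPASSFILE", environ=fake_env
)
print(from_env)
print(explicit)
```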
Similarly, for remotes that use their own ssh key, the keyfile argument can be passed to point to that key's location. This will be added to the ssh call in the format -i {keyfile}.
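To make the effect of these options concrete, the sketch below assembles an ssh command line with an optional `-i {keyfile}` and an optional `sshpass -f` prefix. The source only specifies the `-i {keyfile}` format. The function `build_ssh_command`, the argument order and the example paths are assumptions and may differ from what remotemanager actually produces.

```python
def build_ssh_command(user, host, keyfile=None, passfile=None):
    """Assemble an ssh command line as a list of arguments."""
    command = []
    if passfile is not None:
        # sshpass reads the password from the file given to -f.
        command += ["sshpass", "-f", passfile]
    command.append("ssh")
    if keyfile is not None:
        # -i selects the identity (private key) file to authenticate with.
        command += ["-i", keyfile]
    command.append(f"{user}@{host}")
    return command


with_key = build_ssh_command(
    "user", "remote.host", keyfile="/home/user/.ssh/remote_key"
)
with_pass = build_ssh_command("user", "remote.host", passfile="/home/user/.file")
print(" ".join(with_key))
print(" ".join(with_pass))
```

Keeping the command as a list of arguments avoids shell quoting problems if it is later passed to `subprocess.run`.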